9 research outputs found

    Deep Exploration for Recommendation Systems

    Modern recommendation systems stand to benefit from probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, effective methods for probing to elicit informative delayed feedback have been lacking. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate the benefits of deep exploration over single-step exploration. Our experiments, carried out with high-fidelity industrial-grade simulators, establish large improvements over existing algorithms.
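
    The deep-exploration idea can be sketched in toy form: keep an ensemble of randomized value functions, sample one member per episode, and follow it greedily, so exploration is temporally extended rather than per-step dithering. Everything below (the chain environment, tabular Q-functions, optimistic initialization, and all hyperparameters) is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # chain length; reward only at the last state
ENSEMBLE, EPISODES, GAMMA, ALPHA = 5, 300, 0.9, 0.5

# One tabular Q-function per ensemble member. Optimistic, randomized
# initialization makes members disagree, so a sampled member can commit
# to a multi-step plan instead of dithering one step at a time.
Q = [rng.normal(2.0, 0.3, size=(N, 2)) for _ in range(ENSEMBLE)]

def step(s, a):
    """Action 1 moves right, action 0 stays; reaching state N-1 pays 1."""
    s2 = s + 1 if a == 1 else s
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

for _ in range(EPISODES):
    k = rng.integers(ENSEMBLE)           # sample one member, follow it greedily
    s, done, t = 0, False, 0
    while not done and t < 4 * N:
        a = int(np.argmax(Q[k][s]))
        s2, r, done = step(s, a)
        for q in Q:                      # every member learns from the shared data
            target = r if done else r + GAMMA * np.max(q[s2])
            q[s, a] += ALPHA * (target - q[s, a])
        s, t = s2, t + 1

# The greedy policy of the ensemble mean should now walk straight to the reward.
policy = np.argmax(np.mean(Q, axis=0), axis=1)
s, reached = 0, False
for _ in range(N):
    s, r, reached = step(s, int(policy[s]))
    if reached:
        break
```

    With single-step (e.g. epsilon-greedy) exploration, reaching the lone rewarding state of a long chain takes time exponential in its length; committing to one sampled value function for a whole episode is what makes the sparse reward findable.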

    Scalable Neural Contextual Bandit for Recommender Systems

    High-quality recommender systems ought to deliver both innovative and relevant content through effective and exploratory interactions with users. Yet supervised learning-based neural networks, which form the backbone of many existing recommender systems, only leverage recognized user interests, falling short when it comes to efficiently uncovering unknown user preferences. While there has been some progress with neural contextual bandit algorithms towards enabling online exploration through neural networks, their onerous computational demands hinder widespread adoption in real-world recommender systems. In this work, we propose a scalable, sample-efficient neural contextual bandit algorithm for recommender systems. To do this, we design an epistemic neural network architecture, Epistemic Neural Recommendation (ENR), that enables Thompson sampling at a large scale. In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6%, respectively, compared to state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves equivalent performance with at least 29% fewer user interactions than the best-performing baseline algorithm. Remarkably, while accomplishing these improvements, ENR demands orders of magnitude fewer computational resources than neural contextual bandit baseline algorithms.
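
    The Thompson-sampling mechanism that ENR scales up can be illustrated with a much smaller stand-in for an epistemic network: per-arm Bayesian linear regression, where acting greedily on a posterior sample gives exploration for free. The reward model, dimensions, and horizon below are invented for the sketch and are not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, T, noise = 3, 2, 2000, 0.1
W_true = rng.normal(size=(K, d))             # hidden per-arm reward weights

A = [np.eye(d) for _ in range(K)]            # per-arm posterior precision
b = [np.zeros(d) for _ in range(K)]
optimal = []

for _ in range(T):
    x = rng.normal(size=d)                   # user/context features
    scores = []
    for k in range(K):
        cov = np.linalg.inv(A[k])
        w_tilde = rng.multivariate_normal(cov @ b[k], cov)  # posterior draw
        scores.append(w_tilde @ x)           # act greedily on the sample
    a = int(np.argmax(scores))
    r = W_true[a] @ x + noise * rng.normal()
    A[a] += np.outer(x, x)                   # conjugate Gaussian update
    b[a] += r * x
    optimal.append(a == int(np.argmax(W_true @ x)))

late_accuracy = float(np.mean(optimal[-200:]))  # approaches 1 as posteriors sharpen
```

    An epistemic network plays the same role as the Gaussian posterior here: it supplies cheap, approximately posterior-distributed reward samples when the exact conjugate update is unavailable.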

    Optimism Based Exploration in Large-Scale Recommender Systems

    Bandit learning algorithms have become an increasingly popular design choice for recommender systems. Despite the strong interest in bandit learning from the community, multiple bottlenecks prevent many bandit learning approaches from reaching production. Two of the most important are scaling to multi-task settings and A/B testing. Classic bandit algorithms, especially those leveraging contextual information, often require reward signals for uncertainty estimation, which hinders their adoption in multi-task recommender systems. Moreover, unlike supervised learning algorithms, bandit learning algorithms place great emphasis on the data collection process through their explorative nature. Such explorative behavior induces unfair evaluation of bandit learning agents in a classic A/B test setting. In this work, we present a novel design of a production bandit learning life-cycle for recommender systems, along with a novel set of metrics to measure efficiency in user exploration. Through large-scale production recommender system experiments and in-depth analysis, we show that our bandit agent design improves personalization for the production recommender system and that our experiment design fairly evaluates the performance of bandit learning algorithms.
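
    Optimism-based exploration of the kind referenced here typically scores each candidate by a reward estimate plus an uncertainty bonus. A minimal LinUCB-style sketch, with invented features, simulated rewards, and an arbitrary bonus weight rather than the production agent, looks like:

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, T, alpha = 4, 3, 3000, 1.0             # items, feature dim, rounds, bonus weight
W_true = rng.normal(size=(K, d))             # hidden per-item reward weights

A = [np.eye(d) for _ in range(K)]            # per-item ridge design matrices
b = [np.zeros(d) for _ in range(K)]
correct = 0

for t in range(T):
    x = rng.normal(size=d)
    ucb = []
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        theta = A_inv @ b[k]                 # ridge estimate of item weights
        bonus = alpha * np.sqrt(x @ A_inv @ x)   # optimism: wide posterior => big bonus
        ucb.append(theta @ x + bonus)
    a = int(np.argmax(ucb))                  # explore via inflated scores, deterministically
    r = W_true[a] @ x + 0.1 * rng.normal()
    A[a] += np.outer(x, x)
    b[a] += r * x
    if t >= T - 300:
        correct += int(a == int(np.argmax(W_true @ x)))

late_accuracy = correct / 300
```

    The bonus shrinks as an item accumulates observations, which is also why naive A/B tests penalize such agents: early rounds deliberately pay an exploration cost that a purely greedy baseline does not.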

    Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning

    Auction-based recommender systems are prevalent in online advertising platforms, but they are typically optimized to allocate recommendation slots based on immediate expected return metrics, neglecting the downstream effects of recommendations on user behavior. In this study, we employ reinforcement learning to optimize for long-term return metrics in an auction-based recommender system. Utilizing temporal difference learning, a fundamental reinforcement learning algorithm, we implement a one-step policy improvement approach that biases the system towards recommendations with higher long-term user engagement metrics. This optimizes value over long horizons while maintaining compatibility with the auction framework. Our approach is grounded in dynamic programming, which shows that our method provably improves upon the existing auction-based base policy. Through an online A/B test conducted on an auction-based recommender system which handles billions of impressions and users daily, we empirically establish that our proposed method outperforms the current production system in terms of long-term user engagement metrics.
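
    The one-step improvement recipe (evaluate the base policy with temporal-difference learning, then re-rank candidates by immediate reward plus discounted next-state value) can be sketched on a two-state user model. The item names, rewards, and transitions below are assumptions for illustration only, not the paper's system.

```python
import numpy as np

rng = np.random.default_rng(3)
GAMMA, ALPHA, STEPS = 0.9, 0.01, 40_000

# Hypothetical items: (immediate base reward, next user-engagement state).
# "clickbait" pays now but leaves the user casual; "quality" builds engagement.
ITEMS = {"clickbait": (1.0, 0), "quality": (0.2, 1)}

def reward(base, state):
    return base * (1 + 2 * state)   # engaged users (state 1) respond 3x more

V = np.zeros(2)
s = 0
names = list(ITEMS)
for _ in range(STEPS):              # TD(0) evaluation of the random base policy
    base, s2 = ITEMS[names[rng.integers(len(names))]]
    r = reward(base, s)
    V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
    s = s2

def improved_choice(state):
    # One-step improvement: greedy w.r.t. r + gamma * V(s') under the learned V.
    return max(ITEMS, key=lambda k: reward(ITEMS[k][0], state) + GAMMA * V[ITEMS[k][1]])
```

    The improved policy recommends the low-immediate-reward "quality" item to casual users because the learned value of the engaged state outweighs the forgone click, which is exactly the bias toward long-term engagement the abstract describes.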

    Dietary inflammatory potential mediated gut microbiota and metabolite alterations in Crohn's disease: A fire-new perspective

    Background & aims: A pro-inflammatory diet interacting with the gut microbiome might act as a trigger for Crohn's disease (CD). We aimed to investigate the relationship between dietary inflammatory potential and changes in microflora and metabolites, and their link with CD. Methods: Dietary inflammatory potential was assessed using the dietary inflammatory index (DII), based on the Food Frequency Questionnaire, in 150 new-onset CD patients and 285 healthy controls (HCs). We selected 41 CD patients and 89 HCs who had not received medication and performed metagenomic and targeted metabolomic sequencing to profile their gut microbial composition as well as fecal and serum metabolites. DII scores were classified into quartiles to investigate associations among the different variables. Results: DII scores of CD patients were significantly higher than those of HCs (0.56 ± 1.20 vs 0.23 ± 1.02, p = 0.017). With adjustment for confounders, a higher DII score was significantly associated with a higher risk of CD (OR: 1.420; 95% CI: 1.049, 1.923; p = 0.023). The DII score was also positively correlated with disease activity (p = 0.001). Morganella morganii and Veillonella parvula were increased while Coprococcus eutactus was decreased in the pro-inflammatory diet group, as well as in CD. DII-related bacteria were associated with disease activity and inflammatory markers in CD patients. Among the metabolic changes, those induced by a pro-inflammatory diet largely involved amino acid metabolic pathways, which were also altered in CD. Conclusions: A pro-inflammatory diet might be associated with increased risk and disease activity of CD. A diet with a high DII is potentially involved in CD by mediating alterations in gut microbiota and metabolites.
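
    As a quick consistency check on the reported estimate (OR 1.420, 95% CI 1.049-1.923 per unit of DII), the implied log-odds coefficient and standard error can be recovered from the interval and round-tripped; this assumes the usual Wald-type interval from logistic regression, which the abstract does not state explicitly.

```python
import math

OR, lo, hi = 1.420, 1.049, 1.923             # reported odds ratio and 95% CI

beta = math.log(OR)                          # implied log-odds per unit DII
se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # implied standard error

# Round trip: the Wald interval rebuilt from beta and se matches the report.
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
```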

    Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning

    This paper addresses the need for advanced techniques for continuously allocating workloads on shared infrastructure in data centers, a problem arising from the growing popularity and scale of cloud computing. It particularly emphasizes the scarcity of research on ensuring guaranteed capacity in capacity reservations during large-scale failures. To tackle these issues, the paper presents scalable solutions for resource management. It builds on the prior establishment of capacity reservation in cluster management systems and the two-level resource allocation problem addressed by the Resource Allowance System (RAS). Recognizing the limitations of Mixed Integer Linear Programming (MILP) for server assignment in a dynamic environment, this paper proposes the use of Deep Reinforcement Learning (DRL), which has been successful in achieving long-term optimal results for time-varying systems. Because directly applying DRL algorithms to large-scale instances with millions of decision variables is impractical, a novel two-level design utilizing a DRL-based algorithm is introduced to solve the optimal server-to-reservation assignment while accounting for fault tolerance, server movement minimization, and network affinity requirements. The paper explores the interconnection of the two levels and the benefits of this approach for achieving long-term optimal results in large-scale cloud systems. We further show in the experiment section that our two-level DRL approach outperforms the MIP solver and heuristic approaches and exhibits significantly reduced computation time compared to the MIP solver. Specifically, our two-level DRL approach performs 15% better than the MIP solver on minimizing the overall cost, and it takes only 26 seconds to execute 30 rounds of decision making, while the MIP solver needs nearly an hour.
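
    The two-level decomposition can be illustrated structurally: an upper level splits a reservation's demand across fault domains, and a lower level picks concrete servers while reusing the previous assignment to minimize movement. The paper drives both levels with DRL; a plain greedy heuristic stands in for it here, and the domain and server data are invented.

```python
def upper_level(demand, domains):
    """Spread one reservation's server demand evenly across fault domains."""
    quota = {dom: demand // len(domains) for dom in domains}
    for dom in domains[: demand % len(domains)]:
        quota[dom] += 1                     # hand out the remainder one by one
    return quota

def lower_level(quota, free_by_domain, previous):
    """Pick concrete servers per domain, reusing previous picks to cut movement."""
    chosen = []
    for dom, need in quota.items():
        # Servers already in the reservation sort first (False < True), so they
        # are kept whenever possible and movement is minimized greedily.
        pool = sorted(free_by_domain[dom], key=lambda s: s not in previous)
        chosen += pool[:need]
    return chosen

free = {"dom1": ["s1", "s2"], "dom2": ["s3", "s4"], "dom3": ["s5"]}
quota = upper_level(4, ["dom1", "dom2", "dom3"])
assignment = lower_level(quota, free, previous={"s2", "s3"})
```

    Splitting the problem this way is what keeps each decision small: the upper level chooses per-domain counts (a few variables), and the lower level only ranks servers within one domain at a time, instead of one monolithic assignment over millions of variables.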